VALL-E X - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

VALL-E X

[VALL-E X, which can synthesize Japanese, English, and Chinese speech in a voice that sounds just like the speaker from only three seconds of audio, is indeed a threat; I tried the OSS version of the technology that MS kept private and felt it firsthand (CloseBox) | TechnoEdge https://www.techno-edge.net/article/2023/08/28/1812.html]

VALL-E-X: A speech synthesis model that can change voice quality without retraining

Text to Speech (TTS) and voice cloning with a voice given as a prompt (using VALL-E X, Python, and PyTorch; on Windows)

Try VALL-E-X with Orange Pi 5 | WASP Corporation

---

This page is auto-translated from /nishio/VALL-E X using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.